
 probabilistic classifier


Supplementary for Neural Methods for Point-wise Dependency Estimation

Neural Information Processing Systems

In this section, we show detailed derivations for the point-wise dependency estimation methods. Four approaches are discussed: Variational Bounds of Mutual Information, Density Matching, Probabilistic Classifier, and Density-Ratio Fitting. For convenience, we define $\Omega = \mathcal{X} \times \mathcal{Y}$. Let $P_{X,Y}$ and $P_X P_Y$ (also written $P_X \otimes P_Y$) be probability measures over a $\sigma$-algebra on $\Omega$, with probability densities given by the Radon-Nikodym derivatives (i.e., $p(x,y) = dP_{X,Y}/d\mu$ and $p(x)p(y) = dP_X P_Y/d\mu$, with $\mu$ being the Lebesgue measure). These estimators have the logarithm of point-wise dependency (PMI) as the intermediate product, which we will show in the following. We denote by $\mathcal{M}$ any class of functions $m: \Omega \to \mathbb{R}$. Proposition 1 ($I_{\rm NWJ}$ and its neural estimation, restating Nguyen-Wainwright-Jordan bound [5, 18]).
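
The statement of Proposition 1 is cut off in this excerpt. For orientation only, the Nguyen-Wainwright-Jordan bound is commonly written in the following form, where $m$ ranges over a function class on $\Omega$; this is the standard textbook form and may differ in notation from the supplementary itself:

$$
I(X;Y) \;\geq\; \sup_{m} \; \mathbb{E}_{P_{X,Y}}\big[m(x,y)\big] \;-\; e^{-1}\, \mathbb{E}_{P_X P_Y}\big[e^{m(x,y)}\big],
\qquad
m^{*}(x,y) \;=\; 1 + \log \frac{p(x,y)}{p(x)\,p(y)} .
$$

When the function class is rich enough to contain $m^{*}$, the bound is tight and $m^{*} - 1$ is exactly the log point-wise dependency, which is why the estimator yields PMI as an intermediate product.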






Supplementary for Neural Methods for Point-wise Dependency Estimation

Neural Information Processing Systems

Four approaches are discussed: Variational Bounds of Mutual Information, Density Matching, Probabilistic Classifier, and Density-Ratio Fitting. Proposition 3 ($I_{\rm JS}$ and its neural estimation, restating Jensen-Shannon bound with f-GAN objective [22]). We adopt the "concatenate critic" design [20, 22, 23] for our neural network parametrized function. Note that the Probabilistic Classifier method applies a sigmoid function to the outputs to ensure probabilistic outputs. To proceed, it suffices to provide an upper bound for $\Pr_S(|l_S(\theta_k)| \geq \varepsilon/2)$.
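
The link between a sigmoid output and point-wise dependency can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary classifier trained to tell joint samples (label 1) from product-of-marginals samples (label 0) with equal numbers of each, so that the odds of the predicted probability estimate p(x,y) / (p(x)p(y)). The function names and the example logits are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping raw critic outputs to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def pd_from_classifier_logit(logit):
    """Point-wise dependency estimate from a classifier logit.

    Assumes balanced classes: label 1 = sample from the joint,
    label 0 = sample from the product of marginals.
    """
    prob_joint = sigmoid(logit)            # estimated p(c=1 | x, y)
    return prob_joint / (1.0 - prob_joint)  # odds = estimated density ratio

# Hypothetical raw outputs of a critic for three (x, y) pairs.
logits = np.array([-2.0, 0.0, 1.5])
pd_hat = pd_from_classifier_logit(logits)
pmi_hat = np.log(pd_hat)  # log odds, which equals the raw logit
```

Under these assumptions the log of the estimated ratio is just the pre-sigmoid logit, so the sigmoid is only needed for training with a cross-entropy loss, not for reading off PMI.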


Neural Methods for Point-wise Dependency Estimation

Neural Information Processing Systems

Since its inception, the neural estimation of mutual information (MI) has demonstrated the empirical success of modeling expected dependency between high-dimensional random variables.


On the Role of Randomization in Adversarially Robust Classification

Neural Information Processing Systems

Deep neural networks are known to be vulnerable to small adversarial perturbations in test data. To defend against adversarial attacks, probabilistic classifiers have been proposed as an alternative to deterministic ones. However, the literature has conflicting findings on the effectiveness of probabilistic classifiers in comparison to deterministic ones. In this paper, we clarify the role of randomization in building adversarially robust classifiers. Given a base hypothesis set of deterministic classifiers, we show the conditions under which a randomized ensemble outperforms the hypothesis set in adversarial risk, extending previous results. Additionally, we show that for any probabilistic binary classifier (including randomized ensembles), there exists a deterministic classifier that outperforms it. Finally, we give an explicit description of the deterministic hypothesis set that contains such a deterministic classifier for many types of commonly used probabilistic classifiers, randomized ensembles and parametric/input noise injection.


Minimum-Risk Recalibration of Classifiers

Neural Information Processing Systems

Recalibrating probabilistic classifiers is vital for enhancing the reliability and accuracy of predictive models. Despite the development of numerous recalibration algorithms, there is still a lack of a comprehensive theory that integrates calibration and sharpness (which is essential for maintaining predictive power). In this paper, we introduce the concept of minimum-risk recalibration within the framework of mean-squared-error (MSE) decomposition, offering a principled approach for evaluating and recalibrating probabilistic classifiers. Using this framework, we analyze the uniform-mass binning (UMB) recalibration method and establish a finite-sample risk upper bound of order $\tilde{O}(B/n + 1/B^2)$ where $B$ is the number of bins and $n$ is the sample size. By balancing calibration and sharpness, we further determine that the optimal number of bins for UMB scales with $n^{1/3}$, resulting in a risk bound of approximately $O(n^{-2/3})$. Additionally, we tackle the challenge of label shift by proposing a two-stage approach that adjusts the recalibration function using limited labeled data from the target domain. Our results show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch. We validate our theoretical findings through numerical simulations, which confirm the tightness of the proposed bounds, the optimal number of bins, and the effectiveness of label shift adaptation.
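
Uniform-mass binning and the bin-count trade-off can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code: the function names, the synthetic data, and the use of the simplified expression B/n + 1/B^2 (constants and log factors dropped) to pick B are all assumptions made here for the example. It also assumes the scores are distinct, so that no bin is empty.

```python
import numpy as np

def fit_umb(scores, labels, num_bins):
    """Fit uniform-mass binning: quantile bin edges, one label mean per bin."""
    edges = np.quantile(scores, np.linspace(0.0, 1.0, num_bins + 1))
    inner = edges[1:-1]  # interior edges; outer bins are open-ended
    bin_ids = np.searchsorted(inner, scores, side="right")
    bin_means = np.array([labels[bin_ids == b].mean() for b in range(num_bins)])
    return inner, bin_means

def apply_umb(inner, bin_means, scores):
    """Map each score to the empirical label frequency of its bin."""
    return bin_means[np.searchsorted(inner, scores, side="right")]

def bins_balancing_bound(n):
    """Minimizer over B of B/n + 1/B**2, i.e. B = (2n)**(1/3), rounded."""
    return max(1, round((2 * n) ** (1 / 3)))

# Synthetic, deliberately miscalibrated scores: true P(y=1 | s) = s**2.
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
labels = (rng.uniform(size=1000) < scores ** 2).astype(float)

num_bins = bins_balancing_bound(len(scores))
inner, bin_means = fit_umb(scores, labels, num_bins)
recalibrated = apply_umb(inner, bin_means, scores)
```

Setting the derivative of B/n + 1/B^2 to zero gives B^3 = 2n, which is where the n^{1/3} scaling comes from; the factor of 2 is specific to this simplified expression.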


A Consistent and Differentiable Lp Canonical Calibration Error Estimator

Neural Information Processing Systems

Calibrated probabilistic classifiers are models whose predicted probabilities can directly be interpreted as uncertainty estimates. It has been shown recently that deep neural networks are poorly calibrated and tend to output overconfident predictions. As a remedy, we propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates, which asymptotically converges to the true $L_p$ calibration error. This novel estimator enables us to tackle the strongest notion of multiclass calibration, called canonical (or distribution) calibration, while other common calibration methods are tractable only for top-label and marginal calibration.